{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4bb87b4e-7866-46e6-8214-7c9928a792a3",
   "metadata": {},
   "source": [
    "# Chapter 3 - 深入看看 Transformer LLM (Looking Inside Transformer LLMs)\n",
    "\n",
    "## 3.1  加载 model 和 tokenizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ceca5771",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%capture\n",
    "# !pip install transformers>=4.41.2 accelerate>=0.31.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9f316e52-aebd-4165-8a4b-fd9d88e82dea",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T08:53:04.810236Z",
     "iopub.status.busy": "2024-10-11T08:53:04.809781Z",
     "iopub.status.idle": "2024-10-11T08:53:09.995918Z",
     "shell.execute_reply": "2024-10-11T08:53:09.995301Z",
     "shell.execute_reply.started": "2024-10-11T08:53:04.810211Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    }
   ],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
    "\n",
    "\n",
    "model_name = \"Qwen/Qwen2.5-0.5B-Instruct\"\n",
    "# Load model and tokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    model_name,\n",
    "    device_map=\"cuda\",\n",
    "    torch_dtype=\"auto\",\n",
    "    trust_remote_code=True,\n",
    ")\n",
    "\n",
    "# Create a pipeline\n",
    "generator = pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    "    return_full_text=False,\n",
    "    max_new_tokens=50,\n",
    "    do_sample=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "026ca9c0-9aaf-475b-8111-f5132c1c752b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T08:56:12.403160Z",
     "iopub.status.busy": "2024-10-11T08:56:12.402695Z",
     "iopub.status.idle": "2024-10-11T08:56:13.404874Z",
     "shell.execute_reply": "2024-10-11T08:56:13.404335Z",
     "shell.execute_reply.started": "2024-10-11T08:56:12.403124Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "（ ） A. 李白 B. 白居易 C. 苏轼 D. 王安石\n",
      "李白\n",
      "\n",
      "“春眠不觉晓，处处闻啼鸟。夜来风雨声，花落知多少\n"
     ]
    }
   ],
   "source": [
    "prompt = \"春风又绿江南岸 是谁写的？\"\n",
    "output = generator(prompt)\n",
    "\n",
    "print(output[0]['generated_text'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f119cf6d-7e32-4387-9036-fda91d428df0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T08:58:44.961035Z",
     "iopub.status.busy": "2024-10-11T08:58:44.960572Z",
     "iopub.status.idle": "2024-10-11T08:58:44.964356Z",
     "shell.execute_reply": "2024-10-11T08:58:44.963816Z",
     "shell.execute_reply.started": "2024-10-11T08:58:44.960998Z"
    },
    "tags": []
   },
   "source": [
    "### 备注 \n",
    "text-generation 仅仅是预测下一个token，所以相对于 如果没有构造成 [chat] 类的格式，效果会更差一些"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ec9ee6b5-6f28-4b89-be78-a461360e839b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T08:59:04.296905Z",
     "iopub.status.busy": "2024-10-11T08:59:04.296413Z",
     "iopub.status.idle": "2024-10-11T08:59:04.303834Z",
     "shell.execute_reply": "2024-10-11T08:59:04.303185Z",
     "shell.execute_reply.started": "2024-10-11T08:59:04.296867Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Qwen2ForCausalLM(\n",
      "  (model): Qwen2Model(\n",
      "    (embed_tokens): Embedding(151936, 896)\n",
      "    (layers): ModuleList(\n",
      "      (0-23): 24 x Qwen2DecoderLayer(\n",
      "        (self_attn): Qwen2SdpaAttention(\n",
      "          (q_proj): Linear(in_features=896, out_features=896, bias=True)\n",
      "          (k_proj): Linear(in_features=896, out_features=128, bias=True)\n",
      "          (v_proj): Linear(in_features=896, out_features=128, bias=True)\n",
      "          (o_proj): Linear(in_features=896, out_features=896, bias=False)\n",
      "          (rotary_emb): Qwen2RotaryEmbedding()\n",
      "        )\n",
      "        (mlp): Qwen2MLP(\n",
      "          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)\n",
      "          (up_proj): Linear(in_features=896, out_features=4864, bias=False)\n",
      "          (down_proj): Linear(in_features=4864, out_features=896, bias=False)\n",
      "          (act_fn): SiLU()\n",
      "        )\n",
      "        (input_layernorm): Qwen2RMSNorm()\n",
      "        (post_attention_layernorm): Qwen2RMSNorm()\n",
      "      )\n",
      "    )\n",
      "    (norm): Qwen2RMSNorm()\n",
      "  )\n",
      "  (lm_head): Linear(in_features=896, out_features=151936, bias=False)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a1d4e00-0c27-4841-8157-59bf403bf2f3",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:01:20.989997Z",
     "iopub.status.busy": "2024-10-11T09:01:20.989510Z",
     "iopub.status.idle": "2024-10-11T09:01:20.996292Z",
     "shell.execute_reply": "2024-10-11T09:01:20.995326Z",
     "shell.execute_reply.started": "2024-10-11T09:01:20.989961Z"
    },
    "tags": []
   },
   "source": [
    "## 3.2 面试要点？\n",
    "- RMSNorm 和 layernorm 的区别？\n",
    "> RMSNorm 可学习参数少于 layernorm,计算量更小。\n",
    "> RMSNorm 只有缩放操作，没有 recenter 的操作。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "718f3535-6a5d-4e08-8937-b71d6476e33b",
   "metadata": {},
   "source": [
    "## 3.3 查看一个 token 的概率分布（采样和解码）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "1aacff57-4c66-42a4-a88b-3320e5f647ba",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:03:31.183503Z",
     "iopub.status.busy": "2024-10-11T09:03:31.183038Z",
     "iopub.status.idle": "2024-10-11T09:03:31.238394Z",
     "shell.execute_reply": "2024-10-11T09:03:31.237886Z",
     "shell.execute_reply.started": "2024-10-11T09:03:31.183467Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "prompt = \"The capital of France is\"\n",
    "\n",
    "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
    "\n",
    "input_ids = input_ids.to(\"cuda\")\n",
    "\n",
    "# Get the output of the model before the lm_head\n",
    "model_output = model.model(input_ids)\n",
    "\n",
    "# Get the output of the lm_head\n",
    "lm_head_output = model.lm_head(model_output[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b3740b3a-30c4-4771-9869-dad58082396a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:03:32.599936Z",
     "iopub.status.busy": "2024-10-11T09:03:32.599582Z",
     "iopub.status.idle": "2024-10-11T09:03:32.603959Z",
     "shell.execute_reply": "2024-10-11T09:03:32.603211Z",
     "shell.execute_reply.started": "2024-10-11T09:03:32.599917Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' Paris'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "token_id = lm_head_output[0, -1].argmax(-1)\n",
    "tokenizer.decode(token_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "d3730ac6-ba60-43b5-bbf0-b51cb15eb5d0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:09:10.593567Z",
     "iopub.status.busy": "2024-10-11T09:09:10.593014Z",
     "iopub.status.idle": "2024-10-11T09:09:10.599657Z",
     "shell.execute_reply": "2024-10-11T09:09:10.599152Z",
     "shell.execute_reply.started": "2024-10-11T09:09:10.593514Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 785, 6722,  315, 9625,  374]], device='cuda:0')"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "input_ids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "27ad6b6a-3c4f-4210-bf05-973deb1dfb75",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:08:38.621677Z",
     "iopub.status.busy": "2024-10-11T09:08:38.621192Z",
     "iopub.status.idle": "2024-10-11T09:08:38.628371Z",
     "shell.execute_reply": "2024-10-11T09:08:38.627777Z",
     "shell.execute_reply.started": "2024-10-11T09:08:38.621639Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(torch.Size([1, 5, 896]), torch.Size([1, 5, 151936]))"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 896 是模型的 config.hidden_state\n",
    "model_output[0].shape, lm_head_output.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99f7d4b6-dce4-40fb-b141-a63885f5a9e9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "eb27cf0d-7c1e-4784-8c66-a6446f888cf3",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:09:52.210657Z",
     "iopub.status.busy": "2024-10-11T09:09:52.210124Z",
     "iopub.status.idle": "2024-10-11T09:09:52.215901Z",
     "shell.execute_reply": "2024-10-11T09:09:52.214703Z",
     "shell.execute_reply.started": "2024-10-11T09:09:52.210635Z"
    },
    "tags": []
   },
   "source": [
    "### 备注 \n",
    "这里有两个 API， model.model() 和 model.lm_head()\n",
    "> model.model 是获取每一个 token hidden state, lm_head 是做 softmax 判断每一词是什么？ "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ecf33e6-3e13-4e5d-b506-225d8189b263",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:11:31.201890Z",
     "iopub.status.busy": "2024-10-11T09:11:31.201440Z",
     "iopub.status.idle": "2024-10-11T09:11:31.206333Z",
     "shell.execute_reply": "2024-10-11T09:11:31.205251Z",
     "shell.execute_reply.started": "2024-10-11T09:11:31.201853Z"
    },
    "tags": []
   },
   "source": [
    "## 3.4 使用 KV cache 加速"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "1e716c33-0d88-4ffb-a94b-da122ff99edb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:12:00.719194Z",
     "iopub.status.busy": "2024-10-11T09:12:00.718878Z",
     "iopub.status.idle": "2024-10-11T09:12:00.731282Z",
     "shell.execute_reply": "2024-10-11T09:12:00.730535Z",
     "shell.execute_reply.started": "2024-10-11T09:12:00.719177Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "prompt = \"Write a very long email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
    "\n",
    "# Tokenize the input prompt\n",
    "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
    "input_ids = input_ids.to(\"cuda\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "265e32f0-e272-4e62-aaa4-4be8a790b14e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:13:57.178245Z",
     "iopub.status.busy": "2024-10-11T09:13:57.177337Z",
     "iopub.status.idle": "2024-10-11T09:15:40.755634Z",
     "shell.execute_reply": "2024-10-11T09:15:40.754837Z",
     "shell.execute_reply.started": "2024-10-11T09:13:57.178227Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "14.8 s ± 4.58 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1\n",
    "# Generate the text\n",
    "generation_output = model.generate(\n",
    "  input_ids=input_ids,\n",
    "  max_new_tokens=1000,\n",
    "  use_cache=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "315e0313-1964-48f2-981a-a84d1da35f34",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-11T09:15:40.757424Z",
     "iopub.status.busy": "2024-10-11T09:15:40.757106Z",
     "iopub.status.idle": "2024-10-11T09:17:37.545535Z",
     "shell.execute_reply": "2024-10-11T09:17:37.544849Z",
     "shell.execute_reply.started": "2024-10-11T09:15:40.757395Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "16.7 s ± 5.17 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1\n",
    "# Generate the text\n",
    "generation_output = model.generate(\n",
    "  input_ids=input_ids,\n",
    "  max_new_tokens=1000,\n",
    "  use_cache=False\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6d4f3f7-3544-46c9-aed7-ea91f9a07ed0",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "hands-on-llm",
   "language": "python",
   "name": "hands-on-llm"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
